    Requisite Variety in Ethical Utility Functions for AI Value Alignment

    Being a complex subject of major importance in AI Safety research, value alignment has been studied from various perspectives in recent years. However, no final consensus on the design of ethical utility functions facilitating AI value alignment has been reached yet. Given the urgency of identifying systematic solutions, we postulate that it might be useful to start from the simple fact that, for the utility function of an AI not to violate human ethical intuitions, it trivially has to be a model of these intuitions and reflect their variety – and, since humans are biological organisms whose brains construct concepts such as moral judgements, the most accurate models of these intuitions are scientific ones. Thus, in order to better assess the variety of human morality, we perform a transdisciplinary analysis applying a security mindset to the issue and summarizing variety-relevant background knowledge from neuroscience and psychology. We complement this information by linking it to augmented utilitarianism as a suitable ethical framework. On this basis, we propose first practical guidelines for the design of approximate ethical goal functions that might better capture the variety of human moral judgements. Finally, we conclude and address possible future challenges. (Comment: IJCAI 2019 AI Safety Workshop)
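    As a purely illustrative sketch (not the authors' formalism), the following Python fragment shows one way an approximate ethical goal function could aggregate perceiver-dependent moral judgements over whole agent-action-outcome transitions, in the spirit of augmented utilitarianism; the Transition fields, the [-1, 1] utility range, and the mean aggregation are assumptions made for the example.

```python
# Illustrative sketch (not the authors' implementation): an approximate ethical
# goal function that aggregates perceiver-dependent moral judgements over whole
# "agent performs action leading to outcome" transitions, so that the variety
# of human moral intuitions is reflected rather than collapsed prematurely.
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence

@dataclass(frozen=True)
class Transition:
    agent: str     # who acts
    action: str    # what is done
    outcome: str   # resulting state description

# A perceiver-dependent judgement: each human rater maps a transition to a
# utility in [-1, 1] according to their own moral intuitions.
Judgement = Callable[[Transition], float]

def ethical_goal_function(perceivers: Sequence[Judgement]) -> Callable[[Transition], float]:
    """Approximate ethical goal function: mean of perceiver-dependent utilities.

    Averaging is only one possible aggregation; capturing requisite variety
    could also require reporting dispersion or vetoes across perceivers.
    """
    def utility(t: Transition) -> float:
        return mean(p(t) for p in perceivers)
    return utility

if __name__ == "__main__":
    raters = [
        lambda t: -1.0 if "harm" in t.outcome else 0.5,
        lambda t: 0.0 if t.agent == "AI" else 0.3,
    ]
    u = ethical_goal_function(raters)
    print(u(Transition(agent="AI", action="triage", outcome="patients prioritized fairly")))
```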

    Transdisciplinary AI Observatory -- Retrospective Analyses and Future-Oriented Contradistinctions

    In recent years, AI safety has gained international recognition in the light of heterogeneous safety-critical and ethical issues that risk overshadowing the broad beneficial impacts of AI. In this context, the implementation of AI observatory endeavors represents one key research direction. This paper motivates the need for an inherently transdisciplinary AI observatory approach integrating diverse retrospective and counterfactual views. We delineate aims and limitations while providing hands-on advice based on concrete practical examples. Distinguishing between unintentionally and intentionally triggered AI risks with diverse socio-psycho-technological impacts, we exemplify a retrospective descriptive analysis followed by a retrospective counterfactual risk analysis. Building on these AI observatory tools, we present near-term transdisciplinary guidelines for AI safety. As a further contribution, we discuss differentiated and tailored long-term directions through the lens of two disparate modern AI safety paradigms. For simplicity, we refer to these two paradigms by the terms artificial stupidity (AS) and eternal creativity (EC), respectively. While both AS and EC acknowledge the need for a hybrid cognitive-affective approach to AI safety and overlap with regard to many short-term considerations, they differ fundamentally in the nature of multiple envisaged long-term solution patterns. By compiling relevant underlying contradistinctions, we aim to provide future-oriented incentives for constructive dialectics in practical and theoretical AI safety research.
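    To make the retrospective dimension concrete, the sketch below outlines in Python how an observatory could catalogue observed AI risk instantiations with an intentionality tag and impact labels, and derive a first descriptive summary; the field names and the Intent enum are illustrative assumptions, not the paper's schema.

```python
# Minimal sketch (illustrative field names, not the paper's schema) of how an
# AI observatory could catalogue observed AI risk instantiations and support a
# retrospective descriptive analysis grouped by intentionality and impact.
from dataclasses import dataclass, field
from enum import Enum
from collections import Counter
from typing import List

class Intent(Enum):
    UNINTENTIONAL = "unintentional"   # design or deployment failures
    INTENTIONAL = "intentional"       # deliberate malicious use

@dataclass
class RiskIncident:
    description: str
    intent: Intent
    impacts: List[str] = field(default_factory=list)  # socio-psycho-technological impact tags

def descriptive_summary(incidents: List[RiskIncident]) -> Counter:
    """Count (intent, impact) pairs as a first retrospective descriptive view."""
    return Counter((i.intent.value, tag) for i in incidents for tag in i.impacts)

if __name__ == "__main__":
    log = [
        RiskIncident("biased screening model", Intent.UNINTENTIONAL, ["discrimination"]),
        RiskIncident("deepfake-based impersonation", Intent.INTENTIONAL, ["fraud", "trust erosion"]),
    ]
    for key, count in descriptive_summary(log).items():
        print(key, count)
```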

    Immoral Programming: What can be done if malicious actors use language AI to launch ‘deepfake science attacks’?

    The problem-solving and imitation capabilities of AI are increasing. In parallel, research addressing ethical AI design has gained momentum internationally. However, from a cybersecurity-oriented perspective in AI safety, it is vital to also analyse and counteract the risks posed by intentional malice. Malicious actors could, for instance, exploit the attack surface of already deployed AI, poison AI training data, sabotage AI systems at the pre-deployment stage or deliberately design hazardous AI. At a time when topics such as fake news, disinformation, deepfakes and, recently, fake science are affecting online debates in the population at large but also specifically in scientific circles, we thematise the following elephant in the room now and not in hindsight: what can be done if malicious actors use AI for not yet prevalent but technically feasible ‘deepfake science attacks’, i.e. on (applied) science itself? Deepfakes are not restricted to audio and visual phenomena, and deepfake text, whose impact could be potentiated with regard to speed, scope, and scale, may represent an underestimated avenue for malicious actors. Not only has the imitation capacity of AI improved dramatically, e.g. with the advent of advanced language AI such as GPT-3 (Brown et al., 2020), but, more generally, present-day AI can already be abused for goals such as (cyber)crime (Kaloudi and Li, 2020) and information warfare (Hartmann and Giles, 2020). Deepfake science attacks on (applied) science and engineering – which belong to the class of what we technically denote as scientific and empirical adversarial (SEA) AI attacks (Aliman and Kester, 2021) – could be instrumental in achieving such aims due to socio-psycho-technological intricacies against which science might not be immune. But if not immunity, could one achieve resilience? This chapter familiarises the reader with a complementary solution to this complex issue: a generic ‘cyborgnetic’ defence (GCD) against SEA AI attacks. As briefly introduced in Chapter 4, the term cyborgnet (which is much more general than and not to be confused with the term ‘cyborg’) stands for a generic, substrate-independent and hybrid functional unit which is instantiated e.g. in couplings of present-day AIs and humans. Amongst many others, GCD uses epistemology, cybersecurity, cybernetics, and creativity research to tailor 10 generic strategies to the concrete exemplary use case of a large language model such as GPT-3. GCD can act as a cognitively diverse transdisciplinary scaffold to defend against SEA AI attacks – albeit with specific caveats.

    Moral Programming: Crafting a flexible heuristic moral meta-model for meaningful AI control in pluralistic societies

    Artificial Intelligence (AI) permeates more and more application domains. Its progress regarding scale, speed, and scope magnifies potential societal benefits but also ethically and safety-relevant risks. Hence, it becomes vital to seek a meaningful control of present-day AI systems (i.e. tools). For this purpose, one can aim at counterbalancing the increasing problem-solving ability of AI with boundary conditions core to human morality. However, a major problem is that morality exists in a context-sensitive, steadily shifting explanatory sphere co-created by humans using natural language – which is inherently ambiguous at multiple levels and neither machine-understandable nor machine-readable. A related problem is what we call epistemic dizziness, a phenomenon linked to the inevitable circumstance that one could always be wrong. Yet, while universal doubt cannot be eliminated from morality, it need not be magnified if the potential/requirement for steady refinements is anticipated by design. Thereby, morality pertains to the set of norms and values enacted at the level of a society, of other not further specified collectives of persons, or of an individual. Norms are instrumental in attaining the fulfilment of values, the latter being an umbrella term for all that seems decisive for distinctions between right and wrong – a central object of study in ethics. In short, for a meaningful control of AI against the background of the changing, context-sensitive and linguistically moulded nature of human morality, it is helpful to craft descriptive and thus sufficiently flexible AI-readable heuristic models of morality. In this way, the problem-solving ability of AI could be efficiently funnelled through these updatable models so as to ideally boost the benefits and mitigate the risks at the AI deployment stage, with the conceivable side-effect of improving human moral conjectures. For this purpose, we introduced a novel transdisciplinary framework denoted augmented utilitarianism (AU) (Aliman and Kester, 2019b), which is formulated from a meta-ethical stance. AU attempts to support the human-centred task of harnessing human norms and values to explicitly and traceably steer AI before humans themselves get unwittingly and unintelligibly steered by the obscurity of AI’s deployment. Importantly, AU is descriptive, non-normative, and explanatory (Aliman, 2020), and is not to be confused with normative utilitarianism. (While normative ethics pertains to ‘what one ought to do’, descriptive ethics relates to empirical studies on human ethical decision-making.) This chapter offers the reader a compact overview of how AU coalesces elements from AI, moral psychology, cognitive and affective science, mathematics, systems engineering, cybernetics, and epistemology to craft a generic scaffold able to heuristically encode given moral frameworks in a machine-readable form. We thematise novel insights and also caveats linked to advanced AI risks, yielding incentives for future work.
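    As a hedged illustration of what such an AI-readable heuristic encoding could look like, the Python sketch below represents a moral framework as weighted values plus machine-checkable norms that serve them, with an explicit update hook for steady refinement; the schema is an assumption made for the example, not the AU formalism itself.

```python
# Hedged sketch of an "AI-readable" heuristic encoding of a given moral
# framework: values with adjustable weights, norms that are instrumental to
# those values, and an update step reflecting that such models are meant to be
# steadily refined. The schema is illustrative only.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Norm:
    description: str
    serves_value: str                   # which value this norm is instrumental to
    satisfied: Callable[[dict], bool]   # machine-checkable heuristic over a context dict

@dataclass
class MoralFramework:
    values: Dict[str, float]            # value name -> weight
    norms: List[Norm]

    def score(self, context: dict) -> float:
        """Heuristic score: sum of weights of values whose norms are satisfied."""
        total = 0.0
        for norm in self.norms:
            if norm.satisfied(context):
                total += self.values.get(norm.serves_value, 0.0)
        return total

    def update_value_weight(self, value: str, new_weight: float) -> None:
        """Refinement hook: society-level weights are expected to shift over time."""
        self.values[value] = new_weight

if __name__ == "__main__":
    fw = MoralFramework(
        values={"privacy": 0.6, "transparency": 0.4},
        norms=[
            Norm("do not store raw user data", "privacy", lambda c: not c.get("stores_raw_data", False)),
            Norm("log automated decisions", "transparency", lambda c: c.get("decisions_logged", False)),
        ],
    )
    print(fw.score({"stores_raw_data": False, "decisions_logged": True}))  # 1.0
```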

    Facing Immersive “Post-Truth” in AIVR?

    In recent years, prevalent global societal issues related to fake news, fakery, misinformation, and disinformation were brought to the fore, leading to the construction of descriptive labels such as “post-truth” to refer to the supposedly newly emerging era. Thereby, the (mis-)use of technologies such as AI and VR has been argued to potentially fuel this new loss of “ground-truth”, for instance via the ethically relevant deepfake phenomena and the creation of realistic fake worlds, presumably undermining experiential veracity. Indeed, unethical and malicious actors could harness tools at the intersection of AI and VR (AIVR) to craft what we call immersive falsehood: fake immersive reality landscapes deliberately constructed for malicious ends. This short paper analyzes the ethically relevant nature of the background against which such malicious designs in AIVR could exacerbate the intentional proliferation of deceptions and falsities. We offer a reappraisal expounding that, while immersive falsehood could manipulate and severely jeopardize the inherently affective constructions of social reality and considerably complicate falsification processes, humans may inhabit neither a post-truth nor a post-falsification age. Finally, we provide incentives for future AIVR safety work, ideally contributing to a future era of technology-augmented critical thinking.

    Hybrid Cognitive-Affective Strategies for AI Safety

    The steadily increasing capabilities of AI systems can have tremendous beneficial impacts on society. However, it is important to simultaneously tackle the possible risks that accompany these developments. Therefore, the relatively young field of AI safety has gained international relevance. In parallel, popular media have been commenting on whether society should regard AI with fear or enthusiasm. However, in order to assess the landscape of AI risks and opportunities, it is instead first and foremost of relevance not to be afraid, not to be enthusiastic, but to understand, as similarly suggested by Spinoza in the 17th century. In this vein, this thesis performs a transdisciplinary examination of how to address possible instantiations of AI risks with the aid of scientifically grounded hybrid cognitive-affective strategies. The identified strategies are “hybrid” because AI systems cannot be analyzed in isolation: the nature of human entities as well as the properties of human-machine interactions have to be taken into account within a socio-technological framework addressing not only unintentional failures but also intentional malice. In turn, the attribute “cognitive-affective” refers to the inherently affective nature of human cognition. We consider two disjoint sets of systems: Type I and Type II systems. Type II systems are systems that are able to consciously create and understand explanatory knowledge. Conversely, Type I systems are all systems that do not exhibit this ability. All current AIs are of Type I. However, even if Type II AI is non-existent nowadays, its implementation is not physically impossible. Overall, we identify the following non-exhaustive set of 10 tailored hybrid cognitive-affective strategic clusters for AI safety: 1) international (meta-)goals, 2) transdisciplinary Type I/II AI safety and related education, 3) socio-technological feedback-loop, 4) integration of affective, dyadic and social information, 5) security measures and ethical adversarial examples research, 6) virtual reality frameworks, 7) orthogonality-based disentanglement of responsibilities, 8) augmented utilitarianism and ethical goal functions, 9) AI self-awareness, and 10) artificial creativity augmentation research. In the thesis, we also introduce the so-called AI safety paradox, stating, figuratively speaking, that value alignment and control represent conjugate requirements. In theory, with a Type II AI, a mutual value alignment might be achievable via a co-construction of novel values, however at the cost of its predictability. Conversely, it is possible to build Type I AI systems that are controllable and predictable, but they would not exhibit a sufficient understanding of human morality. Nevertheless, AI safety can be addressed by a cybersecurity-oriented and risk-centered approach reformulating AI safety as a discipline which proactively addresses AI risks and reactively responds to occurring instantiations of AI risks. In a nutshell, future AI safety requires transdisciplinarily conceived and scientifically grounded dynamics combining proactive error-prediction and reactive error-correction within a socio-technological feedback-loop, together with the cognizance that it is first of relevance not to be afraid, not to be enthusiastic, but to understand – that the price of security is eternal creativity.
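    As a purely schematic sketch of the closing point (all function names and the knowledge-base dictionary are assumptions made for illustration, not the thesis' formalization), the Python fragment below shows the bare shape of a socio-technological feedback loop coupling proactive error-prediction with reactive error-correction.

```python
# Bare-bones schematic (illustrative only) of a socio-technological feedback
# loop: proactive error-prediction registers anticipated risks before they
# materialize, reactive error-correction responds to observed incidents, and
# both update a shared, revisable knowledge base.
from typing import Callable, Dict, List

def feedback_loop(
    predict_errors: Callable[[Dict[str, List[str]]], List[str]],  # proactive step
    observe_incidents: Callable[[], List[str]],                   # deployment monitoring
    correct_error: Callable[[str], str],                          # reactive step
    knowledge: Dict[str, List[str]],
    iterations: int = 2,
) -> Dict[str, List[str]]:
    for _ in range(iterations):
        knowledge.setdefault("anticipated", []).extend(predict_errors(knowledge))
        for incident in observe_incidents():
            knowledge.setdefault("corrected", []).append(correct_error(incident))
    return knowledge

if __name__ == "__main__":
    result = feedback_loop(
        predict_errors=lambda kb: ["reward misspecification"],
        observe_incidents=lambda: ["biased output flagged by users"],
        correct_error=lambda incident: f"patched: {incident}",
        knowledge={},
    )
    print(result)
```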

    Epistemic defenses against scientific and empirical adversarial AI attacks

    In this paper, we introduce “scientific and empirical adversarial AI attacks” (SEA AI attacks) as an umbrella term for not yet prevalent but technically feasible deliberate malicious acts of specifically crafting AI-generated samples to achieve an epistemic distortion in (applied) science or engineering contexts. In view of possible socio-psycho-technological impacts, it seems responsible to ponder countermeasures from the onset and not in hindsight. In this vein, we consider two illustrative use cases: the example of AI-produced data to mislead security engineering practices and the conceivable prospect of AI-generated contents to manipulate scientific writing processes. Firstly, we contextualize the epistemic challenges that such future SEA AI attacks could pose to society in the light of broader efforts relevant to, i.a., AI safety, AI ethics and cybersecurity. Secondly, we set forth a corresponding supportive generic epistemic defense approach. Thirdly, we perform a threat modelling for the two use cases and propose tailor-made defenses based on the foregoing generic deliberations. Strikingly, our transdisciplinary analysis suggests that employing distinct explanation-anchored, trust-disentangled and adversarial strategies is one possible principled complementary epistemic defense against SEA AI attacks – albeit with caveats yielding incentives for future work.
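    To make the threat-modelling step tangible, the sketch below encodes the paper's two use cases as minimal threat-model records in Python; the record fields and the one-line paraphrases of the explanation-anchored, trust-disentangled and adversarial strategies are illustrative assumptions rather than the paper's actual modelling.

```python
# Hedged sketch of a minimal threat-model record for SEA AI attacks; the two
# entries mirror the use cases named in the abstract, while the listed defenses
# only paraphrase the generic strategy labels (explanation-anchored,
# trust-disentangled, adversarial) rather than the paper's detailed defenses.
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class SeaThreatModel:
    target: str                 # (applied) science/engineering asset under attack
    adversary_goal: str         # intended epistemic distortion
    attack_vector: str          # how AI-generated samples enter the workflow
    candidate_defenses: List[str]

GENERIC_DEFENSES = [
    "explanation-anchored: require explanatory accounts, not just plausible text or data",
    "trust-disentangled: separate trust in sources from trust in contents",
    "adversarial: red-team the workflow with self-crafted AI-generated samples",
]

USE_CASES = [
    SeaThreatModel(
        target="security engineering practice",
        adversary_goal="mislead practitioners with AI-produced data",
        attack_vector="fabricated measurements or logs injected into analyses",
        candidate_defenses=GENERIC_DEFENSES,
    ),
    SeaThreatModel(
        target="scientific writing process",
        adversary_goal="manipulate the literature with AI-generated content",
        attack_vector="machine-written manuscripts or reviews entering peer review",
        candidate_defenses=GENERIC_DEFENSES,
    ),
]

if __name__ == "__main__":
    for tm in USE_CASES:
        print(tm.target, "->", tm.adversary_goal)
```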

    VR, Deepfakes and Epistemic Security

    In recent years, technological advancements in the AI and VR fields have increasingly often been paired with considerations on ethics and safety aimed at mitigating unintentional design failures. However, cybersecurity-oriented AI and VR safety research has emphasized the need to additionally appraise instantiations of intentional malice exhibited by unethical actors at the pre- and post-deployment stages. On top of that, in view of ongoing malicious deepfake developments that can represent a threat to the epistemic security of a society, security-aware AI and VR design strategies require an epistemically-sensitive stance. In this vein, this paper provides a theoretical basis for two novel AIVR safety research directions: 1) VR as an immersive testbed for VR-deepfake-aided epistemic security training and 2) AI as a catalyst within a deepfake-aided so-called cyborgnetic creativity augmentation facilitating an epistemically-sensitive threat modelling. For illustration, we focus our use case on deepfake text – an underestimated deepfake modality. In the main, the two proposed transdisciplinary lines of research exemplify how AIVR safety efforts to defend against unethical actors could naturally converge toward AIVR ethics whilst counteracting epistemic security threats.
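    As a rough, non-VR stand-in for the kind of deepfake-text drill that an epistemic security training could build on, the Python sketch below scores how well a trainee distinguishes human-written from AI-generated text; the sample texts, their labels, and the placeholder trainee heuristic are hypothetical, and a VR testbed would present such material immersively rather than on a console.

```python
# Illustrative stand-in (console-level, not VR) for a deepfake-text drill: a
# trainee (here replaced by a placeholder heuristic) labels text samples as
# human- or AI-written and receives an accuracy score. Samples and heuristic
# are hypothetical placeholders for curated training material.
import random
from typing import Callable, List, Tuple

# (text, is_ai_generated) pairs; real training material would be curated.
SAMPLES: List[Tuple[str, bool]] = [
    ("Our measurements were repeated five times under identical conditions.", False),
    ("The results conclusively prove the hypothesis beyond any statistical doubt.", True),
]

def run_drill(samples: List[Tuple[str, bool]], flag_as_ai: Callable[[str], bool]) -> float:
    """Return the fraction of samples whose origin was identified correctly."""
    shuffled = samples[:]
    random.shuffle(shuffled)
    correct = sum(1 for text, is_ai in shuffled if flag_as_ai(text) == is_ai)
    return correct / len(shuffled)

if __name__ == "__main__":
    # Placeholder for a human trainee's judgement during the drill.
    naive_trainee = lambda text: "conclusively" in text
    print(f"accuracy: {run_drill(SAMPLES, naive_trainee):.0%}")
```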